Merge main back to flash_attn #1

yubofredwang · 2025-12-29T04:50:28Z

Motivation

Fix merge conflicts so that sgl-project/SpecForge#314 can be merged.

Modifications

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

Format your code according to Code Formatting with Pre-Commit.
Add unit tests as outlined in Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

* added tests for scripts * added tests for scripts * polish * polish * polish * polish * polish * polish * polish * added tests for scripts * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish

* support checkpoint * lint * capture only required hidden states * revert regen * fix llama * backward compatible * Update specforge/modeling/target/custom_backend/qwen3_moe.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * gemini suggests * fix * fix phi --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* add missing * Update specforge/modeling/target/custom_backend/qwen3_moe.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…ng (#378) * feat: add training support for DeepSeek-v3 EAGLE-3 speculative decoding Co-authored-by: GeLee-Q <865038696@qq.com> Co-authored-by: Gao016 <yngao016@163.com> Co-authored-by: yzlnew <yzlnew@gmail.com> * fix: correct values in deepseek-v3-671b-eagle3.json Co-authored-by: GeLee-Q <865038696@qq.com> Co-authored-by: Gao016 <yngao016@163.com> Co-authored-by: yzlnew <yzlnew@gmail.com> * chore: update examples and templates for DeepSeek-V3 EAGLE-3 --------- Co-authored-by: chenyefei.cyf <chenyefei.cyf@U-9V5T77LW-2356.local> Co-authored-by: GeLee-Q <865038696@qq.com> Co-authored-by: Gao016 <yngao016@163.com> Co-authored-by: yzlnew <yzlnew@gmail.com>

* add ds v3 * init * modify sglang fit deepseek * fix deepseek rparser * ulysses finish * ring offline finish * tmp * test pass * test fail * test * clean up * remove deepseek * clean up * clean up * - * - * format * fix unit test --------- Co-authored-by: Yu Feng <fengyufengyu@didiglobal.com> Co-authored-by: daiyajun <daiyajun@didiglobal.com>

* feat: make dataloader num_workers configurable and fix prefetch_factor issue
  1. scripts/train_eagle3.py:
     - Added a num_workers argument (default: 4) to replace the hardcoded value.
     - This allows adjusting the worker count in environments with little shared memory, or for debugging.
  2. specforge/data/utils.py:
     - Fixed prefetch_factor when num_workers is 0 by forcing it to None.
* Fix: add num_workers argument and fix dataloader bug

Co-authored-by: yeshihai <yeshihai@MBP-LX22VD4VGG-2240.local>
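The prefetch_factor fix follows from a PyTorch DataLoader rule: in recent releases, passing a non-None prefetch_factor together with num_workers=0 raises a ValueError, because prefetching only applies to worker processes. A minimal sketch of such a guard, assuming a hypothetical helper name build_dataloader_kwargs (this is not the actual code in specforge/data/utils.py):

```python
from typing import Any, Dict, Optional


# Hypothetical helper for illustration only; not SpecForge's implementation.
def build_dataloader_kwargs(
    num_workers: int = 4,
    prefetch_factor: Optional[int] = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for torch.utils.data.DataLoader.

    prefetch_factor is only meaningful for multiprocess loading, so it is
    forced to None when num_workers == 0 (main-process loading).
    """
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    return {
        "num_workers": num_workers,
        # Keep the caller's prefetch_factor only when worker processes exist.
        "prefetch_factor": prefetch_factor if num_workers > 0 else None,
    }
```

With torch installed, the result could be unpacked into the loader, e.g. `DataLoader(dataset, batch_size=1, **build_dataloader_kwargs(num_workers=0))`, which then loads in the main process instead of failing on the prefetch_factor check. Keeping the guard in one helper means a low shared-memory setting of 0 workers needs no other changes.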

FrankLeeeee and others added 30 commits November 23, 2025 12:36

added sglang arguments (#317)

00edc17

unified benchmark scripts (#319)

72337ef

* unified benchmark scripts * polish

fixed data regeneration script (#321)

95cb2ae

fix ckpt dir check (#320)

f6ec513

Co-authored-by: baojiangnan <baojiangnan@kuaishou.com>

support gen hidden states use fp8 (#318)

d582d7d

add sglang args in gen hidden states polish polish polish

Add subset options for opc (#312)

34b5883

* Add subset options for opc * lint & cat datasets

Fixed the installation command

d960896

organized unit tests (#324)

1e3fb6e

fixed non-runnable examples (#322)

44409f6

* fixed non-runnable examples * polish * polish

merged data generation scripts (#323)

341abf5

Fix args type (#328)

ed30525

* fix missing import * fix-args-type

added autoflakes pre-commit hook (#327)

04a6bcf

fixed specforge imports (#332)

70f5187

bump to v0.1.1 (#330)

8dff2b7

Support more sampling params in data generation (#333)

9b05770

* support more sampling params * remove recommended * some comments * lint

Add qwen3-coder-30B-A3B-Instruct Eagle3 Training Script (#329)

b77e6f7

* Add examples of qwen3-coder-30B-A3B training script * tiny fix * Remove WANDB API key export from script to align with other examples Removed Weights & Biases configuration from script.

fix mmstart benchmark (#334)

5c43694

updated benchmark docs (#340)

3e0cda0

* updated benchmark docs * polish * polish

grouped args for better reference (#343)

44d5c62

* grouped args for better reference * grouped args for better reference

added profiling (#344)

3bca52c

Feature/online train use hf backend optimize GPU usage (#346)

94de9f8

* feature: optimize online training use hf backend less GPU memory polish polish * polish

added model-download-dir (#347)

5c355b8

* added model-download-dir * polish

fix: is_running to get_run (#353)

c65a358

add default build_dataset_num_proc value (#354)

dc44caf

fixed kv head replication in qwen3 moe (#357)

9639a52

* fixed kv head replication in qwen3 moe * polish * polish

[Docs] add benchmark refer (#358)

e0625b0

* docs:add benchmark refer polish polish * polish

optimized sglang backend memory usage (#359)

e012016

* optimized sglang backend memory usage * polish

FrankLeeeee and others added 14 commits December 23, 2025 23:15

bump version to v0.2.0 (#386)

73e6f80

added dashboard link (#387)

e30518a

Support Qwen3, Qwen3-Next, Kimi-K2, DeepSeek model templates (#381)

280fab9

* support thinking models update kimi-k2 and deepseek polish fix lint fix kimi-k2 and gpt-oss fix lint * update parse to handle the Boundary token --------- Co-authored-by: Shenggui Li <somerlee.9@gmail.com>

fixed templates (#389)

5660635

* fixed templates * polish

corrected llama3 examples (#391)

4ac6bb7

added regenerated datasets (#395)

a686e3d

* added regenerated datasets * polish

fixed benchmark process termination (#394)

866ca44

added regenerated data processing for llama series (#396)

b7febe8

added specbundle to readme (#397)

886ab9c

Merge branch 'main' into modal-labs/flash_attn

10004e7

fix deps

6742725

yubofredwang mentioned this pull request Dec 29, 2025

feat: added low VRAM flash attention backend sgl-project/SpecForge#314 (Closed)

yubofredwang and others added 14 commits January 1, 2026 15:31

lint

5f18a47

bump flash-attn

d75ba86

update ci image

080bd28

test fa3

b849a2a

fix bug

1d6bbe5

fix bug

569f375

fix bug

498994f

Update Docker image version in test workflow

3e6e827

Update pip install command in test workflow

42fef31

Add --no-build-isolation flag to pip install command

Update pyproject.toml

79d9411

Add setuptools installation to workflow

18918fc

Update test.yaml

45cad19

Update test.yaml

3a95e87

Refactor test workflow to eliminate redundancy

021d8f2

Removed duplicate test run commands and unnecessary ls statements.

yubofredwang closed this Jan 14, 2026

14 participants